Prediction of protein subcellular locations using fuzzy k-NN method

Authors

  • Ying Huang
  • Yanda Li
Abstract

MOTIVATION Protein localization data are a valuable resource for elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. RESULTS In this paper, the fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed on a new data set derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% was achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and a possible improvement in prediction accuracy for protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm
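
The pipeline the abstract describes (dipeptide composition as features, fuzzy k-NN as classifier, jackknife evaluation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used fuzzy k-NN formulation with inverse-distance weighting and a fuzzifier `m`, crisp (one-hot) training labels, and Euclidean distance. The function names and the defaults `k=3` and `m=2.0` are hypothetical choices made for the example; the abstract does not state the settings actually used.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered amino acid pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional vector of dipeptide frequencies."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Pairs containing non-standard residues are ignored.
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def fuzzy_knn_memberships(query, train_vectors, train_labels, k=3, m=2.0):
    """Class memberships of `query` from its k nearest training vectors.

    Assumption: each neighbor votes for its own label with weight
    1 / distance**(2 / (m - 1)); memberships are normalized to sum to 1.
    """
    neighbors = sorted(
        (math.dist(query, vec), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # avoids division by zero for identical vectors
    weights = {}
    for distance, label in neighbors:
        w = 1.0 / (max(distance, eps) ** exponent)
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def jackknife_accuracy(vectors, labels, k=3, m=2.0):
    """Leave-one-out accuracy: each sample is predicted from all others."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        memberships = fuzzy_knn_memberships(
            vectors[i], rest_vectors, rest_labels, k, m
        )
        predicted = max(memberships, key=memberships.get)
        correct += predicted == labels[i]
    return correct / len(vectors)
```

In use, each protein sequence would be converted with `dipeptide_composition`, and the location with the highest membership would be taken as the prediction. Unlike plain k-NN, the membership values also give a rough indication of how confident each prediction is.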

Similar resources

Prediction of protein subcellular locations using Markov chain models.

A novel method was introduced to predict protein subcellular locations from sequences. Using sequence data, this method achieved a prediction accuracy higher than previous methods based on the amino acid composition. For three subcellular locations in a prokaryotic organism, the overall prediction accuracy reached 89.1%. For eukaryotic proteins, prediction accuracies of 73.0% and 78.7% were att...

Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method

Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the distributions of multiple classes of enzymes are highly overlapped. To cope with the overlap prob...

Prediction of protein subcellular multisite localization using a new feature extraction method.

A basic problem of proteomics is identifying the subcellular locations of a protein. One factor making the problem more complicated is that some proteins may simultaneously exist in two or more than two subcellular locations. To improve multisite prediction quality, it is necessary to use effective feature extraction methods. Here, we developed a new feature extraction method based on the pK va...

Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs

MOTIVATION The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS We considered 12 subcellular locations in...

Comparison of the performance of the Cox model and the K-nearest neighbor method in estimating the survival of kidney transplant patients

Introduction & Objective: The Cox model is a common method for estimating survival, and the validity of its results depends on the proportional hazards assumption. K-nearest neighbor is a nonparametric method for estimating survival probability in heterogeneous communities. The purpose of this study was to compare the performance of the k-nearest neighbor method (K-NN) with the Cox model. Materials & Methods: This ...

Journal:
  • Bioinformatics

Volume 20, Issue 1

Pages -

Publication date: 2004